Knowledge Transfer in Markov Decision Processes
Authors
Abstract
Markov Decision Processes (MDPs) are an effective way to formulate many problems in Machine Learning. However, learning the optimal policy for an MDP can be time-consuming, especially when nothing is known about the policy to begin with. An alternative approach is to find a similar MDP for which an optimal policy is known and modify that policy as needed. We present a framework for measuring the quality of knowledge transfer when transferring policies from one MDP to another. Our formulation is based on MDP bisimulation metrics, which provide a stable quantitative notion of state similarity for MDPs. Given two MDPs and a state mapping from the first to the second, a policy defined on the latter naturally induces a policy on the former. We provide a bound on the value function of the induced policy, showing that if the two MDPs are behaviorally close in terms of bisimulation distance and the original policy is close to optimal, then the induced policy is guaranteed to be close to optimal as well. We also present experiments in which simple MDPs are used to test the tightness of the bound provided by the bisimulation distance. In light of these results, we suggest a new similarity measure.
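To make the transfer mechanism concrete, here is a minimal sketch of the induced-policy construction described above. It is our own illustration, not code from the paper: the two toy MDPs, the identity state mapping, and all numeric values are assumptions chosen so that the two processes are behaviorally close.

```python
# Sketch: transfer a near-optimal policy from a "target" MDP back to a
# "source" MDP through a state mapping, then measure the value loss that
# the bisimulation-based bound is designed to control.
# All MDPs and numbers below are illustrative assumptions.
import numpy as np

GAMMA = 0.9

def value_iteration(P, R, gamma=GAMMA, iters=500):
    """Greedy (near-optimal) deterministic policy; P: (A,S,S), R: (A,S)."""
    V = np.zeros(R.shape[1])
    for _ in range(iters):
        Q = R + gamma * (P @ V)          # Q-values, shape (A, S)
        V = Q.max(axis=0)
    return {s: int(Q[:, s].argmax()) for s in range(R.shape[1])}

def evaluate(P, R, policy, gamma=GAMMA):
    """Exact value of a deterministic policy: solve (I - gamma*P_pi) V = R_pi."""
    S = R.shape[1]
    P_pi = np.array([P[policy[s], s] for s in range(S)])
    R_pi = np.array([R[policy[s], s] for s in range(S)])
    return np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

def induced_policy(target_policy, state_map):
    """Pull a policy on the target MDP back through f: source -> target."""
    return {s: target_policy[state_map[s]] for s in state_map}

# Source MDP M1 (2 states, 2 actions) and a slightly perturbed target M2.
P1 = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]])
R1 = np.array([[1.0, 0.0], [0.0, 1.0]])
P2 = np.array([[[0.85, 0.15], [0.25, 0.75]], [[0.55, 0.45], [0.15, 0.85]]])
R2 = np.array([[0.95, 0.05], [0.05, 0.95]])

pi2 = value_iteration(P2, R2)                 # near-optimal on the target
pi1 = induced_policy(pi2, {0: 0, 1: 1})       # induced on the source
loss = evaluate(P1, R1, value_iteration(P1, R1)) - evaluate(P1, R1, pi1)
print("per-state value loss of the induced policy:", loss)
```

The printed per-state loss is the quantity the paper bounds: when the bisimulation distance between the two MDPs is small and the original policy is near-optimal on the target, the induced policy's loss on the source is guaranteed to be small as well.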
Related papers
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques for solving large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs) that can be organized into levels. At each level, smaller problems called restricted MDPs are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
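The backbone of such hierarchical solvers can be illustrated with a plain SCC decomposition of an MDP's transition support graph (which states can reach which with positive probability under some action). The sketch below is a generic Tarjan implementation on a made-up graph, not the accelerated algorithm proposed in that paper.

```python
# Sketch: Tarjan's SCC algorithm on the support graph of an MDP's
# transitions. The graph below is an illustrative assumption.
def tarjan_sccs(graph):
    index, lowlink, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:          # v is the root of an SCC
            scc = []
            while True:
                w = stack.pop(); on_stack.discard(w); scc.append(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs   # emitted in reverse topological order, sinks first

# States 0-1 form one component; 2-3 form another, reachable from it.
transitions = {0: [1], 1: [0, 2], 2: [3], 3: [2]}
print(tarjan_sccs(transitions))   # [[3, 2], [1, 0]]
```

The reverse topological order is exactly what level-based decomposition methods exploit: restricted MDPs on downstream components can be solved first and their values reused upstream.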
Measuring the Distance Between Finite Markov Decision Processes
Markov decision processes (MDPs) have been studied for many decades. Recent research on using transfer learning methods to solve MDPs has shown that knowledge learned from one MDP may be used to solve a similar MDP better. In this paper, we propose two metrics for measuring the distance between finite MDPs. Our metrics are based on the Hausdorff metric, which measures the distance between two su...
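Since that abstract builds on the Hausdorff metric, here is the underlying construction on plain finite sets, as a reminder. This is our own illustration with absolute difference as the ground metric; the paper lifts the construction to the states of two MDPs.

```python
# Sketch: Hausdorff distance between two finite sets A and B under a ground
# metric d: max over each set of the distance to the nearest point of the
# other set. Ground metric and example sets are illustrative assumptions.
def hausdorff(A, B, d=lambda x, y: abs(x - y)):
    directed = lambda X, Y: max(min(d(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

print(hausdorff([0.0, 1.0, 2.0], [0.5, 1.5]))   # 0.5
```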
Using bisimulation for policy transfer in MDPs
Knowledge transfer has been suggested as a useful approach for solving large Markov Decision Processes. The main idea is to compute a decision-making policy in one environment and use it in a different environment, provided the two are "close enough". In this paper, we use bisimulation-style metrics (Ferns et al., 2004) to guide knowledge transfer. We propose algorithms that decide what actions...
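For concreteness, here is a sketch of the metric computation this style of transfer relies on: one iteration of a bisimulation-metric operator in the spirit of Ferns et al. (2004), with the Kantorovich (1-Wasserstein) term solved as a small transportation LP. This is our own illustration under assumed constants (reward weight 1, discount gamma) and assumes scipy is available; it is not the authors' code.

```python
# Sketch: one fixed-point iteration of a bisimulation-style metric:
#   d(s,t) <- max_a [ |R(a,s) - R(a,t)| + gamma * K(P[a,s], P[a,t]; d) ]
# where K is the Kantorovich distance under the current ground metric d.
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, d):
    """1-Wasserstein distance between distributions p and q under ground
    metric d (S x S), solved as a transportation LP over couplings."""
    S = len(p)
    A_eq, b_eq = [], []
    for i in range(S):                      # row marginals: sum_j pi[i,j] = p[i]
        row = np.zeros(S * S); row[i * S:(i + 1) * S] = 1
        A_eq.append(row); b_eq.append(p[i])
    for j in range(S):                      # column marginals: sum_i pi[i,j] = q[j]
        col = np.zeros(S * S); col[j::S] = 1
        A_eq.append(col); b_eq.append(q[j])
    res = linprog(d.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

def bisim_update(P, R, d, gamma=0.9):
    """One application of the metric operator; P: (A,S,S), R: (A,S)."""
    A, S, _ = P.shape
    new_d = np.zeros((S, S))
    for s in range(S):
        for t in range(S):
            new_d[s, t] = max(abs(R[a, s] - R[a, t])
                              + gamma * kantorovich(P[a, s], P[a, t], d)
                              for a in range(A))
    return new_d

# Iterating bisim_update from d = 0 converges to the metric's fixed point
# for gamma < 1; the resulting d(s,t) is the state-similarity the transfer
# algorithms consult.
```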
Relational Markov Decision Processes: Promise and Prospects
Relational Markov Decision Processes (RMDPs) offer an elegant formalism that combines probabilistic and relational knowledge representations with the decision-theoretic notions of action and utility. In this paper we motivate RMDPs to address a variety of problems in AI, including open world planning, transfer learning, and relational inference. We describe a symbolic dynamic programming approac...
Transfer via soft homomorphisms
The field of transfer learning aims to speed up learning across multiple related tasks by transferring knowledge between source and target tasks. Past work has shown that when the tasks are specified as Markov Decision Processes (MDPs), a function that maps states in the target task to similar states in the source task can be used to transfer many types of knowledge. Current approaches for auto...
Publication date: 2006